ggplot2 (3): Bar Charts

Dataset

library(ggplot2)
library(gcookbook)
head(pg_mean)

The data looks like this:

> head(pg_mean)
group weight
1 ctrl 5.032
2 trt1 4.661
3 trt2 5.526

A simple plot

ggplot(pg_mean,aes(x=group, y=weight))+geom_bar(stat="identity")

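Code explanation: geom_bar() draws a bar chart. "geom" is short for geometric object, i.e. the kind of mark that is drawn, such as points, lines, or polygons. Passing stat="identity" makes each bar's height equal to the y value in the data.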
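As a hedged aside, ggplot2 2.2.0 and later also provide geom_col(), which is shorthand for geom_bar(stat="identity"). The sketch below rebuilds pg_mean by hand from the three rows printed above (an assumption, made so the example runs without gcookbook) and uses layer_data() to check that both calls compute the same bar heights.

```r
library(ggplot2)

# pg_mean rebuilt by hand from the printed output above, so gcookbook is not needed
pg_mean <- data.frame(group = c("ctrl", "trt1", "trt2"),
                      weight = c(5.032, 4.661, 5.526))

p_bar <- ggplot(pg_mean, aes(x = group, y = weight)) + geom_bar(stat = "identity")
p_col <- ggplot(pg_mean, aes(x = group, y = weight)) + geom_col()

# layer_data() returns the data ggplot2 computed for a layer; y is the bar height
bar_heights <- as.numeric(layer_data(p_bar)$y)
col_heights <- as.numeric(layer_data(p_col)$y)
print(all.equal(bar_heights, col_heights))
```

If the two vectors match, either spelling can be used; geom_col() simply saves typing.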

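Type of the x variable

A bar chart normally has a discrete x variable. If the variable in the dataset is continuous, convert it to a factor with factor(). Here is the data, followed by the plot before conversion: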

> BOD
Time demand
1 1 8.3
2 2 10.3
3 3 19.0
4 4 16.0
5 5 15.6
6 7 19.8
str(BOD)
## 'data.frame': 6 obs. of 2 variables:
## $ Time : num 1 2 3 4 5 7
## $ demand: num 8.3 10.3 19 16 15.6 19.8
## - attr(*, "reference")= chr "A1.4, p. 270"

Plot:

ggplot(BOD, aes(x=Time, y=demand)) + geom_bar(stat="identity")
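# The stat argument controls how the data are summarised before drawing.
# geom_bar() defaults to stat="count" (called stat="bin" in old ggplot2 versions), which plots the number of rows at each x.
# stat="identity", used here, plots the y value supplied for each x as is.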

Now convert Time to a factor and plot again:

ggplot(BOD, aes(x=factor(Time), y=demand)) + geom_bar(stat="identity")
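To see why the conversion changes the axis, it may help to look at what factor() does to Time. The short sketch below uses only the built-in BOD data frame; the variable names time_num and time_fac are introduced here for illustration.

```r
# BOD ships with base R (package datasets)
time_num <- BOD$Time
time_fac <- factor(BOD$Time)

# Continuous: the axis spans the whole numeric range, so 6 gets an empty slot
print(range(time_num))

# Discrete: only the six observed values become axis positions
print(levels(time_fac))
```

Because 6 is not a level of the factor, the factor version of the plot has no gap between 5 and 7.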

Fill colour

The code below restyles the bar chart above with a light blue fill and a black outline:

ggplot(pg_mean,aes(x=group,y=weight))+geom_bar(stat="identity",fill="lightblue",colour="black")

Grouped bar chart

This example uses the cabbage_exp dataset from the gcookbook package.

head(cabbage_exp) # inspect the dataset
## Cultivar Date Weight sd n se
## 1 c39 d16 3.18 0.9566144 10 0.30250803
## 2 c39 d20 2.80 0.2788867 10 0.08819171
## 3 c39 d21 2.74 0.9834181 10 0.31098410
## 4 c52 d16 2.26 0.4452215 10 0.14079141
## 5 c52 d20 3.11 0.7908505 10 0.25008887
## 6 c52 d21 1.47 0.2110819 10 0.06674995

Plot:

ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(position="dodge",stat="identity")

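Note the position="dodge" argument. "Dodge" means the bars avoid each other, so the two levels of Cultivar are drawn side by side. Without this argument the bars for the two levels are stacked, as in the following plot: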

ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(stat="identity")

Setting colours

The RColorBrewer package provides a range of colour palettes, for example:

library(RColorBrewer)
ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(position="dodge",stat="identity",colour="red")+scale_fill_brewer(palette="Pastel1")

Frequency bar chart:

ggplot(diamonds,aes(x=cut))+geom_bar()
# equivalent to geom_bar(stat="count"); old ggplot2 versions called this stat="bin"
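One way to convince yourself of what the default stat does is to compare it with a plain table(). The sketch below uses the diamonds data that ships with ggplot2; cut_counts and bar_counts are names chosen here for illustration.

```r
library(ggplot2)

# Counts per cut level computed directly in base R
cut_counts <- table(diamonds$cut)

# Counts computed by geom_bar()'s default stat
p <- ggplot(diamonds, aes(x = cut)) + geom_bar()
bar_counts <- layer_data(p)$count

print(cut_counts)
print(all(bar_counts == as.vector(cut_counts)))
```

The bar heights are just the per-level row counts, which is why no y aesthetic is needed in this case.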

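By default geom_bar() uses stat="count" (stat="bin" in old ggplot2 versions). When x is a categorical variable, i.e. a factor, it counts the observations in each group, as in the plot above. When x is continuous, old versions binned the values and drew a histogram; current versions count the rows at each distinct value, and geom_histogram() is the dedicated function for binned counts. The continuous case looks like this: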

ggplot(diamonds,aes(x=carat))+geom_bar()
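For a binned histogram in current ggplot2, geom_histogram() is the more direct tool. A minimal sketch follows; the bin width of 0.1 is an illustrative assumption, not a value from the original post.

```r
library(ggplot2)

# binwidth = 0.1 is an illustrative choice
p_hist <- ggplot(diamonds, aes(x = carat)) + geom_histogram(binwidth = 0.1)

# Every diamond falls into exactly one bin, so the bin counts add up to the row count
hist_counts <- layer_data(p_hist)$count
print(sum(hist_counts) == nrow(diamonds))
```

Setting binwidth explicitly also avoids the message ggplot2 prints when it has to pick a default number of bins.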

Colouring the bars

upc <- subset(uspopchange,rank(Change)>40) # rank() gives each value's rank; >40 keeps the 10 states with the largest Change
upc
## State Abb Region Change
## 3 Arizona AZ West 24.6
## 6 Colorado CO West 16.9
## 10 Florida FL South 17.6
## 11 Georgia GA South 18.3
## 13 Idaho ID West 21.1
## 29 Nevada NV West 35.1
## 34 North Carolina NC South 18.5
## 41 South Carolina SC South 15.3
## 44 Texas TX South 20.6
## 45 Utah UT West 23.8
ggplot(upc,aes(x=Abb,y=Change,fill=Region))+geom_bar(stat="identity")
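The rank() filter can be tried on a small vector first. The values below are hypothetical and only serve to show which elements a condition of the form rank(x) > k keeps.

```r
# Hypothetical percentage changes for five made-up regions
change <- c(a = 5.2, b = 12.1, c = -0.6, d = 8.4, e = 3.3)

ranks <- rank(change)                       # 1 = smallest value, 5 = largest
top2 <- change[ranks > length(change) - 2]  # keep the two largest values

print(ranks)
print(top2)
```

With 50 states, rank(Change) > 40 works the same way and keeps ranks 41 to 50.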

You can also set the fill colours with hex colour codes:

ggplot(upc,aes(x=reorder(Abb,Change),y=Change,fill=Region))+geom_bar(stat="identity",colour="black")+scale_fill_manual(values=c("#669933","#FFCC66"))+xlab("State")

Colouring positive and negative bars differently

csub <- subset(climate,Source=="Berkeley" & Year >= 1900)
csub$pos <- csub$Anomaly10y>=0 # TRUE where the anomaly is non-negative, FALSE where it is negative
ggplot(csub,aes(x=Year,y=Anomaly10y,fill=pos))+geom_bar(stat="identity",position="identity")

Adjusting the fill colours

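ggplot(csub,aes(x=Year,y=Anomaly10y,fill=pos))+geom_bar(stat="identity",position="identity",colour="black",size=0.25)+scale_fill_manual(values=c("#CCEEFF","#FFDDDD"),guide="none")
# guide="none" removes the legend (guide=FALSE is deprecated since ggplot2 3.3.4); in ggplot2 >= 3.4.0 the outline width is set with linewidth rather than size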

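Adjusting bar width and spacing

First, the original plot (the basic pg_mean plot from above, drawn with the default width of 0.9):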

Set the bar width to 0.5 (narrower bars, wider gaps):

ggplot(pg_mean,aes(x=group,y=weight))+geom_bar(stat="identity",width=0.5)

Set the width to 1, the maximum, so that adjacent bars touch:

ggplot(pg_mean,aes(x=group,y=weight))+geom_bar(stat="identity",width=1)
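The width argument sets how much of each x slot a bar occupies, which can be checked from the computed layer data. This is a sketch under the assumption that pg_mean is rebuilt by hand from the rows printed earlier; widths_default and widths_half are illustrative names.

```r
library(ggplot2)

# pg_mean rebuilt by hand from the printed output, so gcookbook is not needed
pg_mean <- data.frame(group = c("ctrl", "trt1", "trt2"),
                      weight = c(5.032, 4.661, 5.526))

p_default <- ggplot(pg_mean, aes(x = group, y = weight)) + geom_bar(stat = "identity")
p_half    <- ggplot(pg_mean, aes(x = group, y = weight)) + geom_bar(stat = "identity", width = 0.5)

# Bar width = right edge minus left edge, in x-axis units (one unit per category)
bar_width <- function(p) {
  d <- layer_data(p)
  as.numeric(d$xmax) - as.numeric(d$xmin)
}
widths_default <- bar_width(p_default)
widths_half    <- bar_width(p_half)
print(widths_default)
print(widths_half)
```

Since categories sit one unit apart, a width of 1 leaves no gap, which is why 1 is the practical maximum.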

Changing the spacing in a grouped bar chart

Original plot:

ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(stat="identity",width=0.5,position="dodge")

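Modified: pass position=position_dodge() with a dodge width (0.7 here) larger than the bar width, which adds space between the bars within each group: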

ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(stat="identity",width=0.5,position=position_dodge(0.7))

Stacked bar chart

ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(stat="identity")+guides(fill=guide_legend(reverse=TRUE))

Note the effect of guides(fill=guide_legend(reverse=TRUE)) on the legend order. For comparison, here is the same plot with reverse=FALSE:

ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(stat="identity")+guides(fill=guide_legend(reverse=FALSE))

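Changing the stacking order

This uses the desc() function from the plyr package. Note that the order aesthetic was deprecated in ggplot2 2.0.0, so the code below only has an effect in older versions.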

library(plyr)
ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar,order=desc(Cultivar)))+geom_bar(stat="identity")
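In current ggplot2 versions, one possible replacement for the order aesthetic is position_stack(reverse=TRUE), available since ggplot2 2.2.0; reordering the factor levels of Cultivar is another. The sketch below is not the original author's method. It rebuilds the three columns of cabbage_exp that the plot needs from the six rows printed earlier (an assumption, so gcookbook is not required) and checks which segments end up at the bottom of each stack.

```r
library(ggplot2)

# Only the columns used by the plot, copied from the printed cabbage_exp rows
cabbage_exp <- data.frame(
  Cultivar = factor(rep(c("c39", "c52"), each = 3)),
  Date     = factor(rep(c("d16", "d20", "d21"), times = 2)),
  Weight   = c(3.18, 2.80, 2.74, 2.26, 3.11, 1.47)
)

p_rev <- ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity", position = position_stack(reverse = TRUE))

# Segments that start at y = 0 are the ones at the bottom of each stack
d <- layer_data(p_rev)
bottom_tops <- sort(as.numeric(d$ymax[d$ymin == 0]))
print(bottom_tops)
```

If the printed values are the c39 weights, then reverse=TRUE has placed the first factor level at the bottom of every stack.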

Polishing the stacked chart

library(RColorBrewer)
ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+geom_bar(stat="identity",colour="black")+guides(fill=guide_legend(reverse=TRUE))+scale_fill_brewer(palette="Pastel1")

Percentage (100% stacked) bar chart

ce <- ddply(cabbage_exp,"Date",transform,percent_weight=Weight/sum(Weight)*100)
ggplot(ce,aes(x=Date,y=percent_weight,fill=Cultivar))+geom_bar(stat="identity")
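If plyr is not available, the same per-date percentages can be computed in base R with ave(). This is an alternative sketch, not the original approach; cabbage_exp is rebuilt from the six printed rows so the example is self-contained.

```r
# Only the columns needed here, copied from the printed cabbage_exp rows
cabbage_exp <- data.frame(
  Cultivar = factor(rep(c("c39", "c52"), each = 3)),
  Date     = factor(rep(c("d16", "d20", "d21"), times = 2)),
  Weight   = c(3.18, 2.80, 2.74, 2.26, 3.11, 1.47)
)

# ave() returns, for every row, the sum of Weight within that row's Date group
ce <- cabbage_exp
ce$percent_weight <- ce$Weight / ave(ce$Weight, ce$Date, FUN = sum) * 100

# Within each Date the percentages should add up to 100
totals <- tapply(ce$percent_weight, ce$Date, sum)
print(totals)
```

The resulting ce data frame can be passed to the same ggplot() call as the ddply() version.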

Polishing the percentage bar chart

ggplot(ce,aes(x=Date,y=percent_weight,fill=Cultivar))+geom_bar(stat="identity",colour="black")+guides(fill=guide_legend(reverse=TRUE))+scale_fill_brewer(palette="Pastel1")

Adding data labels

Labels just below the top of each bar:

ggplot(cabbage_exp,aes(x=interaction(Date,Cultivar),y=Weight))+geom_bar(stat="identity")+geom_text(aes(label=Weight),vjust=1.5,colour="white")

Labels just above the top of each bar:

ggplot(cabbage_exp,aes(x=interaction(Date,Cultivar),y=Weight))+geom_bar(stat="identity")+geom_text(aes(label=Weight),vjust=-0.2,colour="blue")

Adjusting the y axis and the labels:

Raise the upper limit of the y axis so that the labels are not clipped:

ggplot(cabbage_exp,aes(x=interaction(Date,Cultivar),y=Weight))+geom_bar(stat="identity")+geom_text(aes(label=Weight),vjust=-0.2)+ylim(0,max(cabbage_exp$Weight)*1.05)

Set the label's y position so that it sits above the top of the bar:

ggplot(cabbage_exp,aes(x=interaction(Date,Cultivar),y=Weight))+geom_bar(stat="identity")+geom_text(aes(y=Weight+0.1,label=Weight))

Labels for grouped bar charts

ggplot(cabbage_exp,aes(x=Date,y=Weight,fill=Cultivar))+
geom_bar(stat="identity",position="dodge")+
geom_text(aes(label=Weight),vjust=1.5,colour="white",position=position_dodge(0.9),size=5)

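Labels for stacked bar charts: at the top of each segment

The cumsum() positions below assume that segments are stacked in data order, as in ggplot2 before 2.2.0. Newer versions stack in the reverse order, so the labels may land in the wrong segment unless the data are sorted accordingly.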

ce <- arrange(cabbage_exp,Date,Cultivar)
ce <- ddply(ce,"Date",transform,label_y=cumsum(Weight))
# cumsum() returns the running total, e.g. cumsum(seq(1,10))
ggplot(ce,aes(x=Date,y=Weight,fill=Cultivar))+
geom_bar(stat="identity")+
geom_text(aes(y=label_y,label=Weight),vjust=1.5,colour="white")
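The ddply() call above is essentially a grouped running total. As a small base R sketch of the same idea, the example below uses four of the printed cabbage_exp weights, listed bottom segment first; dates, weights and label_y are illustrative names.

```r
# Two stacks ("d16" and "d20"), each listed bottom segment first
dates   <- c("d16", "d16", "d20", "d20")
weights <- c(3.18, 2.26, 2.80, 3.11)

# Running total within each date: the top edge of every stacked segment
label_y <- ave(weights, dates, FUN = cumsum)
print(label_y)
```

Each value is the height at which the corresponding segment ends, which is where a top-aligned label should be placed.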

Labels for stacked bar charts: centred in each segment

ce <- arrange(cabbage_exp,Date,Cultivar)
ce <- ddply(ce,"Date",transform,label_y=cumsum(Weight)-0.5*Weight)
ggplot(ce,aes(x=Date,y=Weight,fill=Cultivar))+
geom_bar(stat="identity")+
geom_text(aes(y=label_y,label=Weight),colour="white")
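In ggplot2 2.2.0 and later, the label positions do not have to be computed by hand: position_stack(vjust=0.5) places each label at the middle of its segment and follows whatever stacking order the bars use. The sketch below is an alternative to the cumsum() approach, not the original author's code, and again rebuilds cabbage_exp from the printed rows.

```r
library(ggplot2)

# Only the columns used by the plot, copied from the printed cabbage_exp rows
cabbage_exp <- data.frame(
  Cultivar = factor(rep(c("c39", "c52"), each = 3)),
  Date     = factor(rep(c("d16", "d20", "d21"), times = 2)),
  Weight   = c(3.18, 2.80, 2.74, 2.26, 3.11, 1.47)
)

p_lab <- ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = Weight), position = position_stack(vjust = 0.5),
            colour = "white")

# Compare the computed label heights with the midpoints of the bar segments
bars <- layer_data(p_lab, 1)
labs <- layer_data(p_lab, 2)
bars <- bars[order(bars$group), ]
labs <- labs[order(labs$group), ]
segment_mid <- (as.numeric(bars$ymin) + as.numeric(bars$ymax)) / 2
label_pos   <- as.numeric(labs$y)
print(all.equal(label_pos, segment_mid))
```

Because the bar layer and the text layer share the same stacking logic, this stays correct if the stacking order is later reversed.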

Labels for stacked bar charts: adding units

ce <- arrange(cabbage_exp,Date,Cultivar)
ce <- ddply(ce,"Date",transform,label_y=cumsum(Weight)-0.5*Weight)
ggplot(ce,aes(x=Date,y=Weight,fill=Cultivar))+
geom_bar(stat="identity",colour="black")+
geom_text(aes(y=label_y,label=paste(format(Weight,nsmall=2),"kg")),size=4)+
guides(fill=guide_legend(reverse=TRUE))+
scale_fill_brewer(palette="Blues")
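The label text in the code above is built with format() and paste(). Here is a quick standalone look at what those two calls produce, using three of the printed weights; the variable names are illustrative.

```r
weights <- c(3.18, 2.8, 2.74)

# nsmall = 2 forces at least two digits after the decimal point ("2.8" becomes "2.80")
formatted <- format(weights, nsmall = 2)

# paste() joins the number and the unit with a space
unit_labels <- paste(formatted, "kg")
print(unit_labels)
```

Without format(), 2.8 would be printed as "2.8" and the labels would have uneven lengths.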

Cleveland dot plot

Basic plot

This uses geom_point(). Start with the most basic version:

tophit <- tophitters2001[1:25,]
ggplot(tophit,aes(x=avg,y=name))+geom_point()

Sorting

The plot above orders the names alphabetically; the plot below orders them by avg:

ggplot(tophit,aes(x=avg,y=reorder(name,avg)))+
geom_point(size=3)+
theme_bw()+
theme(panel.grid.major.x=element_blank(),
panel.grid.minor.x=element_blank(),
panel.grid.major.y=element_line(colour="grey60",linetype="dashed"))
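The sorting is done entirely by reorder(), which rearranges factor levels according to a second variable. A minimal base R sketch follows, with hypothetical player names and averages that are not taken from tophitters2001.

```r
# Hypothetical players and batting averages
name <- factor(c("Cruz", "Abreu", "Baker"))
avg  <- c(0.30, 0.35, 0.32)

# Default factor levels are alphabetical
print(levels(name))

# reorder() sorts the levels by avg, smallest first
ordered_name <- reorder(name, avg)
print(levels(ordered_name))
```

ggplot2 draws discrete axes in level order, so the player with the lowest avg appears at the bottom of the y axis and the highest at the top.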

Swapping the x and y axes

ggplot(tophit,aes(x=reorder(name,avg),y=avg))+
geom_point(size=3)+
theme_bw()+
theme(axis.text.x = element_text(angle=60,hjust=1),
panel.grid.major.y=element_blank(),
panel.grid.minor.y=element_blank(),
panel.grid.major.x=element_line(colour="grey60",linetype="dashed"))

Matchstick (lollipop) plot

nameorder <- tophit$name[order(tophit$lg,tophit$avg)]
tophit$name <- factor(tophit$name,levels=nameorder)
> head(tophit)
id first last name year stint team lg g ab r h 2b 3b hr rbi sb cs
1 walkela01 Larry Walker Larry Walker 2001 1 COL NL 142 497 107 174 35 3 38 123 14 5
2 suzukic01 Ichiro Suzuki Ichiro Suzuki 2001 1 SEA AL 157 692 127 242 34 8 8 69 56 14
3 giambja01 Jason Giambi Jason Giambi 2001 1 OAK AL 154 520 109 178 47 2 38 120 2 0
4 alomaro01 Roberto Alomar Roberto Alomar 2001 1 CLE AL 157 575 113 193 34 12 20 100 30 6
5 heltoto01 Todd Helton Todd Helton 2001 1 COL NL 159 587 132 197 54 2 49 146 7 5
6 aloumo01 Moises Alou Moises Alou 2001 1 HOU NL 136 513 79 170 31 1 27 108 5 1
bb so ibb hbp sh sf gidp avg
1 82 103 6 14 0 8 9 0.3501
2 30 53 10 8 4 4 3 0.3497
3 129 83 24 13 0 9 17 0.3423
4 80 71 5 4 9 9 9 0.3357
5 98 104 15 5 1 5 14 0.3356
6 57 57 14 3 0 8 18 0.3314
ggplot(tophit,aes(x=avg,y=name))+
geom_segment(aes(yend=name),xend=0,colour="grey")+
geom_point(size=3,aes(colour=lg))+
scale_colour_brewer(palette="Set1",limits=c("NL","AL"))+
theme_bw()+
theme(panel.grid.major.y=element_blank(),
legend.position=c(1,0.55),
legend.justification=c(1,0.5))

Note: order() returns the positions in the original vector of the elements taken in ascending order, whereas sort() returns the sorted values themselves.

a<-c(3,9,0,12,19)
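sort(a) # returns the sorted values
## [1] 0 3 9 12 19
order(a) # returns the original positions in sorted order
## [1] 3 1 2 4 5
# 3 means the 3rd element of the original vector comes first, 1 means the 1st element comes second, and so on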
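The matchstick plot above calls order() with two keys (lg, then avg). The sketch below shows how that behaves on a tiny hypothetical data set, and also that indexing with order() reproduces sort().

```r
# Hypothetical leagues and batting averages for four players
lg  <- c("NL", "AL", "NL", "AL")
avg <- c(0.350, 0.349, 0.336, 0.342)

# Sort by league first, then by avg within each league
idx <- order(lg, avg)
print(idx)
print(data.frame(lg = lg[idx], avg = avg[idx]))

# Indexing a vector by its own order() gives the same result as sort()
a <- c(3, 9, 0, 12, 19)
same_as_sort <- identical(a[order(a)], sort(a))
print(same_as_sort)
```

Using the resulting order as factor levels is what groups the AL players together below the NL players on the y axis.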

Faceting by league (lg)

ggplot(tophit,aes(x=avg,y=name))+
geom_segment(aes(yend=name),xend=0,colour="grey50")+
geom_point(size=3,aes(colour=lg))+
scale_colour_brewer(palette="Set1",limits=c("NL","AL"),guide="none")+
theme_bw()+
theme(panel.grid.major.y=element_blank())+
facet_grid(lg~.,scales="free_y",space="free_y")

References

  1. 常肖楠, 邓一硕, 魏太云. R数据可视化手册[M]. 人民邮电出版社, 2014.